Address-free memory access based on program syntax correlation of loads and stores
نویسندگان
چکیده
An increasing cache latency in next-generation processors incurs profound performance impacts in spite of advanced out-of-order execution techniques. One way to circumvent this cache latency problem is to predict load values at the onset of pipeline execution by exploiting either the load value locality or the address correlation of stores and loads. In this paper, we describe a new load value speculation mechanism based on the program syntax correlation of stores and loads. We establish a symbolic cache (SC), which is accessed in early pipeline stages to achieve a zero-cycle load. Instead of using memory addresses, the SC is accessed by the encoding bits of base register ID plus the displacement directly from the instruction code. Performance evaluations using SPEC95 and SPEC2000 integer programs on SimpleScalar simulation tools show that the SC achieves higher prediction accuracy in comparison with other load value speculation methods, especially when hardware resources are limited.
منابع مشابه
Symbolic Cache: Fast Memory Access Based on Program Syntax Correlation of Loads and Stores
An increasing cache latency in next-generation processors incurs profound performance impacts in spite of advanced out-of-order execution techniques. One way to circumvent this cache latency problem is to predict load values at the onset of pipeline execution by exploiting either the load value locality or the address correlation of stores and loads. In this paper, we describe a new load value ...
متن کاملSpecial section on the 2001 International Conference on Computer Design (ICCD)
encompasses a wide range of topics in the research, design, and implementation of computer systems and their components. ICCDs unique multidisciplinary emphasis provides an ideal environment for developers and researchers to discuss practical and theoretical work covering system and processor architecture , logic, and circuit design, verification and test methods along with tools and methodolog...
متن کاملA Section Based Program Analysis to Reduce Overhead of Detecting Unsynchronized Thread Communication
Most systems that test and verify parallel programs, such as deterministic execution engines, data race detectors and software transactional memory systems, require instrumenting loads and stores in an application. This can cause a very significant runtime and memory overhead compared to executing uninstrumented code. Multithreaded programming typically allows any thread to perform loads and st...
متن کاملA High Performance Parallel IP Lookup Technique Using Distributed Memory Organization and ISCB-Tree Data Structure
The IP Lookup Process is a key bottleneck in routing due to the increase in routing table size, increasing traıc and migration to IPv6 addresses. The IP address lookup involves computation of the Longest Prefix Matching (LPM), which existing solutions such as BSD Radix Tries, scale poorly when traıc in the router increases or when employed for IPv6 address lookups. In this paper, we describe a ...
متن کاملA High Performance Parallel IP Lookup Technique Using Distributed Memory Organization and ISCB-Tree Data Structure
The IP Lookup Process is a key bottleneck in routing due to the increase in routing table size, increasing traıc and migration to IPv6 addresses. The IP address lookup involves computation of the Longest Prefix Matching (LPM), which existing solutions such as BSD Radix Tries, scale poorly when traıc in the router increases or when employed for IPv6 address lookups. In this paper, we describe a ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- IEEE Trans. VLSI Syst.
دوره 11 شماره
صفحات -
تاریخ انتشار 2003